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1.  INTRODUCTION 


— In  this  paper  we  study  Aalen’s  (1980)  additive  risk  model  for  the  regression  analysis 
of  censored  survival  datgj  Let  A ;{t)  =  Aj{t,Yi)  denote  the  hazard  function  at  time  t  for 
subject  i  whose  covariates  are  given  by  the  p-vector  Yi  =  YirY-  Aalen’s  model 

stipulates  that 

M‘)  =  E  «.•(*)»«  =  y/«w  (ii) 

3-1 

where  a  =  (c*i, . . . ,  ap)'  is  an  unknown  vector  of  hazard  functions.  More  generally,  the 
covariates  can  be  time  dependent,  as  considered  in  section  5. 

""'a*  The  additive  risk  model  provides  a  useful  alternative  to  Cox’s  (1972)  proportional 
hazards  model  when  large  sample  size  makes  its  application  feasible.  It  is  capable  of  provid¬ 
ing  detailed  information  concerning  the  temporal  influence  of  each  covariate.  The  temporal 
influences  of  the  covariates  are  not  assumed  to  be  proportional  as  they  are  in  Cox’s  model. 
Buckley  (1984)  Has  pointed  out  that  additive  risk  models  are  biologically  more  plausible 
than  proportional  hazard  models.  Also,  the  use  of  the  proportional  hazards  model  when  the 
true  model  is  additive  risk  has  been  found  by  O’Neill  (1986]to  result  in  serious  asymptiotic 
bias. 


Aalen  (1980)  introduced  estimators  for  the  vector  of  integrated  hazard  functions, 
A(t)  =  Jq  a(s)ds,  which  use  continuous  data  (containing  the  exact  values  of  failure  and 
censoring  times).  These  estimators  generalize  the  well-known  Nelson- Aalen  estimator,  the 
natural  estimator  in  the  case  of  one  covariate.  However,  except  in  the  case  of  one  covariate 
(Aalen,  1978),  the  asymptotic  theory  was  not  fully  developed.  One  possible  form  of  these 
estimators  was  motivated  by  a  formal  least  squares  principle.  This  estimator,  defined  pre¬ 
cisely  in  (2.3)  and  (5.4),  is  refered  to  here  as  Aalen’s  least  squares  estimator.  Aalen  observed 
that  this  estimator  probably  gives  reasonable  estimates  and  he  applied  it  to  analysis  of  data 
from  the  Veterans  Administration  Lung  Cancer  Study  Group. 

The  first  purpose  of  the  present  paper  is  to  apply  the  additive  risk  model  to  the 
analysis  of  grouped  data  in  which  only  the  person-years  at  risk  and  number  of  uncensored 


■j?  deaths  over  successive  time  intervals,  tabulated  for  various  levels  of  the  covariates,  are 
available.  This  kind  of  data  typically  arises  in  epidemiological  cohort  studies  involving  the 
follow-up  of  large  population  groups  over  many  years,  see  Breslow  (1986).  Our  approach  is 
to  use  an  estimator,  constructed  using  the  method  of  sieves  (Grenander,  1981),  for  which  an 
asymptotic  distribution  theory  was  developed  by  McKeague  (1987).  This  estimator,  called 
the  integrated  histogram  sieve  estimator,  requires  only  grouped  data. 


In  section  2  we  describe  the  various  estimators  and  confidence  bands,  give  a  heuristic 
motivation  for  the  integrated  histogram  sieve  estimator  and  show  that  it  is  approximately 
unbiased.  The  results  of  a  simulation  study  are  reported  in  section  3.  In  section  4  we  apply 
the  additive  risk  model  to  the  analysis  of  grouped  data  on  the  incidence  of  cancer  mortality 
among  Japanese  atomic  bomb  survivors. 


The  second  purpose  of  this  paper  is  to  derive  the  asymptotic  distribution  of  Aalen’s 
least  squares  estimator..  This  is  done  in  Section  5.  It  turns  out  that,  appropriately  normal¬ 
ized,  Aalen’s  least  squares  estimator  and  the  integrated  histogram  sieve  estimator  have  the 
same  asymptotic  distribution.  In  comparing  the  two  estimators  this  indicates  that  (asymp¬ 
totically)  there  is  no  loss  in  using  the  grouped  data  when  the  continuous  data  is  unavailable. 
We  conclude  section  5  with  a  discussion  of  weighted  least  squares  estimators  for  the  additive 
risk  model.  . . '  f  r  .  „ 


2.  ESTIMATORS  AND  CONFIDENCE  BANDS 


We  shall  use  the  random  censorship  model  in  which  the  ith  individual’s  failure  time 
Xi  is  assumed  to  be  an  absolutely  continuous  random  variable  conditionally  independent 
of  the  censoring  time  given  the  covariate  vector  Y{.  Let  X,  =  min(X,,Wi)  and  Si  = 
I(X{  <  W»)  denote  the  time  to  the  end-point  event  and  the  indicator  for  noncensorship 
respectively.  Assume  that  the  observation  triples  (Xj,6i,Vi),*  =  l,...,n  are  i.i.d.  and  the 
conditional  hazard  function  A»(t)  of  X»  given  Y»  satisfies  the  model  (1.1). 


Let  Ji,...,Jd  be  intervals  which  partition  the  follow-up  period  [0, T\  over  which 
estimation  of  ai,...,ap  or  Ai,...,Ap  is  desired.  Specifically,  let  Jr  =  (br,6r  +  tT\, 


r  =  1  The  total  time  that  the  ith  individual  is  observed  to  be  at  risk  in  inter¬ 

val  Ir  is  given  by 

Tir  =  I  I{Xi  >  t)dt. 

j  Jr 

The  indicator  that  the  ith  individual  undergoes  an  uncensored  failure  in  Ir  is  given  by 

6ir  =  6iI(XieIr). 

Let  Cr  be  the  p  x  1  vector 

n 

Cr  =  Y,6irYi 

»= 1 

and  let  Dr  be  the  p  x  p  matrix 

n 

Dr  =  £(v<i7)rir. 

»•= 1 

In  the  sequel  we  use  the  following  notational  convention:  for  any  square  matrix  K,  K~l 
denotes  the  inverse  of  if  if  if  is  invertible,  the  zero  matrix  otherwise.  The  histogram  sieve 
estimator  is  defined  by  a(t)  =  ar  when  t  e  Ir,  where  ar  =  (ary)  is  the  p  x  1  vector 
given  by  ar  =  D~1Cr.  Note  that  the  histogram  sieve  estimator  can  be  evaluated  from  the 
grouped  data  consisting  of  the  total  time  at  risk  and  the  number  of  uncensored  failures  in 
each  interval  Ji, . . . ,  Ja  tabulated  for  all  realized  levels  of  the  covariates. 

The  form  of  the  histogram  sieve  estimator  can  be  motivated  by  the  following  argu¬ 
ment.  Suppose  the  time  intervals  Ji,  I2,.  •  • ,  Id  are  short  enough  so  that  the  hazard  rates 
A* (<)  for  all  subjects  can  be  regarded  as  constant  within  each  interval  Jr  =  (6r,6r-t-^r]>  that 
is,  we  can  take  a(t)  =  ar  for  t  e  Ir.  This  implies  A i(t)  =  A^r  for  t  e  Ir  where  A;r  =  V/ar. 

Under  this  condition  we  can  show  that  E[DTar\JT)  =  E{Cr\7r)  where  JT  is  the 
<7—  field  generated  by  the  covariates  Ylf . . . ,  Yn  and  the  events  {.Xi  >  6r},  t  =  1, . . . ,  n .  If  the 
number  of  observed  failures  in  Ir  is  large,  the  Law  of  Large  Numbers  will  imply  Drar  «  CT . 
This  suggests  the  estimator  ar  =  DllCr  which  is  the  histogram  sieve  estimator. 

To  show  that  E{Drar\Jr)  =  E{Cr\Jr)  we  shall  prove  the  stronger  result  E{Drar \$r) 
=  E(Cr \Qr)  where  Qr  is  the  larger  (7-field  generated  by  Ir  and  the  censoring  times 
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Wi, . . .  ,Wn .  From  the  definitions  we  have 


£(Cr|£r)  =  and 

»=1 

n 

E(Drar\9r)  =Y,YilY‘*T)E{Txr\5r). 

i=l 

Let  ZiT  =  (VFj  —  br)  A  tr .  Given  Xx  >  bT ,  the  time  Ttr  is  an  exponential  random  variable 
(with  parameter  A,r)  truncated  at  ZiT  so  that  we  obtain 

E(Tir\gr)  =  Aty1[l  -  exp  (-A irZir)]I{Xi  >  br). 


Similarly 

E(Sir  \9r)  =  P{Xi  >  br,  Tir  <  Zir\9r} 

=  [1  —  exp  (-\irZir)}I{Xi>br). 

Substituting  these  in  the  above  formulas  for  E[Cr\9r )  and  E{Drar\9r)  and  noting  that 
y/ar  =  Air ,  we  obtain  the  desired  equality. 

Let  mr  be  the  number  of  subjects  at  risk  in  Ir ,  this  is  mr  =  YliHX,  >  br) .  The 
histogram  sieve  estimator  ar  will  be  approximately  unbiased  as  long  as  mr  is  large  and 
A irlr  is  small  for  all  the  probability  (conditional  on  Xx  >  br)  of  failure  during  Ir  is  small 
for  all  subjects  i.  In  showing  this  we  use  the  notation 

Fr  =  E(Dr\9r),  ipr  =  Dr  -  Fr  and  Ar  =  CT  -  Frar. 

Both  E(xpr\9r)  =  0  and  E(kr\9r)  =  0  with  the  second  fact  following  from 

Frar  =  E(Drar\9r)  —  F(Cr\9r) 

which  was  demonstrated  previously.  Explicit  formulas  for  rpr  and  Ar  are 

n  n 

4>r  =  J2  YiYWr  and  Ar  =  ^  Y,6*r 
1=1  »=1 

with  T*r  =  Tir  -  E{Tir\9r)  and  61  =  6ir  -  E(6ir\9r )• 
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Since  A »r£r  is  small,  it  is  easy  to  see  that  is  small  with  high  probability.  This 
implies  tpr  is  small  relative  to  Fr  and  justifies  the  series  expansion 

D~l  =  (Fr  +  W  )-l=F~l-  F~ 1  %l>rF~ 1  +  F-'trF-'jrF-1 

Using  this  expansion  and  CT  =  Frar  +  Ar  we  find  (ignoring  higher  order  terms) 

E(D~lCT\5r)  ««r  +  E(F~ li>rOr\$r)  ~  E(F~ 'W 1  Ar|£r) . 

Both  of  the  bias  terms  are  0(m~l).  For  example,  writing  Vv  and  Ar  as  sums  and  using 
the  independence  of  the  subjects  gives 

n 

E(F^rF~lkr\5r) 

i=l 

This  sum  contains  mr  nonzero  terms.  We  expect  the  norm  ||  F~ 1  ||  to  be  0(m7x)  so  that 
each  nonzero  term  in  the  sum  will  be  0(m~ 2) .  Thus  the  sum  is  0(mr  •  m~2)  =  0(m~1). 
The  other  bias  term  is  handled  similarly. 

Note,  we  are  assuming  above  that  the  hazard  rates  for  all  subjects  are  constant 
within  each  interval  Ir.  If  the  hazard  rates  vary  within  the  intervals,  this  will  introduce  an 
additional  source  of  bias. 

The  integrated  histogram  sieve  estimator  is  denoted  .A(t)  =  a(s)ds ,  where  integra¬ 
tion  of  a  is  carried  out  componentwise.  Let  A(t)  =  j^a(s)ds.  Under  conditions  (Cl)-(C4) 
in  section  5.2,  given  that  the  partition  Ji,..., Id  becomes  finer  as  n  — ♦  oo  in  such  a  way 
that  y/n  max(fi,. . . , £j)  — ►  0  and  n  min(£i,. . .  ,£j)  — »  oo,  then  an  asymptotic  100(1  -  a)% 
confidence  band  for  Aj  is  given  by 

M 1)  ±  c„n-'/’6AT)'IHl  +  5^).<  <  [0,T|  (2.1) 

bjV  ) 

where  ca  is  the  upper  a  quantile  of  the  distribution  of  suptt|0  |i?0(t)|,  B°  is  the  Brownian 
bridge  process, 

G,(t)  =  [  SA*)**  (2-2) 

Jo 


where  gj(t)  =  gr]-  if  t  e  Ir  and 


p  p 


9rj  =  n^2&ru^2  H  Mr{u,V,w){Dr  l)y v[Dr  1)}W, 


u—  1  v  —  1  tn  —  1 


Mr{u,Vtw)  —  ^  'YxuYjyYiwTxr  • 


»=1 


A  table  for  the  distribution  of  supt£|0  ^  |fi°(t)|  has  been  given  by  Hall  and  Wellner  (1980). 
An  asymptotic  100(1  —  a)%  confidence  interval  for  A3(t) ,  at  fixed  t  e  [0,  T],  is  given  by 


AJ[t)±za/2n  1/2Gy(t)1/2, 


where  zaj2  is  the  upper  a/ 2  quantile  for  the  standard  normal  distribution. 
Aalen’s  least  squares  estimator  is  defined  by 


M‘)  =  E 

6i  =  1 
Xi<t 


(2.3) 


where  the  summation  is  over  individuals  i  whose  failure  times  are  uncensored  and  less  than 
or  equal  to  t,  Di  is  the  p  x  p  matrix 


bi=  E  y‘y» 

Xk>Xi 


(2.4) 


where  the  summation  is  over  individuals  k  at  risk  at  time  X{ . 


Under  conditions  (Al)-(A3)  in  section  5.1  an  asymptotic  100(1  —  a)%  confidence  band  for 
Aj  is  given  by 

Ay(l)  ±  +  |^),l  <  |0.r] 

where  ca  is  defined  above  and 


(2.5) 


~  XT 


(2.6) 


Si=  i 
Xi<t 


p  p 


gi}  =  nJ2  (i>rlYi) '^2Mi{u,v,w){b~l)jv{b~l)jw, 


U=1 


v  =  l  w  —  l 
6 


A/i(u,v,uj)  —  ^  ^  YkuYkvYkw 

xk>xt 

An  asymptotic  100(1  —  a)%  confidence  interval  for  Aj(t) ,  at  fixed  t  e  [0,  T],  is  given  by 
Aj{t)  ±  za/2n~l/2G3{t )1/2. 

Note  that,  unlike  the  confidence  intervals,  the  confidence  bands  given  above  are  de¬ 
pendent  on  the  choice  of  T.  As  T  increases  the  band  widens  at  all  points.  It  also  becomes 
more  unreliable  since  the  effective  sample  size  fix  =  #{t  :  Xi  >  T}  for  estimating  the 
asymptotic  variance  at  time  T  is  getting  smaller.  In  practice  we  found  it  reasonable  to  set 
T  so  that  nx  is  at  least  10%  of  the  sample  size.  This  is  the  case  for  the  simulated  data  in 
section  3  and  for  the  atomic  bomb  survivor  data  in  section  4. 

3.  ADDITIVE  RISK  MODEL  SIMULATIONS 

In  order  to  evaluate  the  performance  of  the  confidence  intervals  and  bands  proposed 
for  grouped  data,  a  Monte-Carlo  experiment  was  performed  to  see  whether  their  asymptotic 
properties  take  effect  under  sample  sizes,  grouping  and  censoring  found  in  typical  applica¬ 
tions.  The  simulation  model  used  p  =  2  covariates  with  i.i.d.  uniform  distributions  on  the 
lattice  {£,r  =  1,...,8}  and  corresponding  hazard  functions  ai(t)  =  l,a2(t)  =  t,t  >  0. 
The  censoring  time  was  independent  of  the  failure  time  and  exponentially  distributed  with 
parameter  j3 ,  for  various  values  of  /?.  The  follow-up  period  was  [0, 1].  The  sieve  intervals 

lx _ ,  Id  were  of  equal  length.  For  various  combinations  of  n,  d,  and  /?,  500  samples 

of  grouped  data  were  generated  via  the  random  censorship  model.  For  each  of  these  500 
samples  we  constructed  the  asymptotic  95%  confidence  bands  (2.1)  for  the  true  integrated 
hazard  functions  Ax(t)  =  t  and  A2(t)  =  ^ t 2  on  the  interval  [0,1].  The  proportion  among 
the  500  bands  that  contained  Aj  on  the  interval  [0,  l]  was  then  determined  (for  j=l  and 
2).  The  sample  size  n  was  set  to  1000  and  4000.  The  sieve  dimension  d  was  set  to  8,  16, 
and  64.  The  censoring  parameter  0  took  the  values  .3  and  1.5  which  amounted  to  28%  and 
68%  censoring  prior  to  the  end  of  follow-up,  respectively.  With  /?  =  . 3  ( /?  =  1.5)  there  were 
on  average  33%  (10%)  surviving  beyond  the  end  of  follow-up. 


The  results  are  displayed  in  Tables  1  and  2  below.  The  random  number  generator 
used  the  same  initial  seed  for  the  runs  marked  a  and  different  seeds  for  the  runs  marked  b 
and  c.  Runs  with  the  same  initial  seed  have  identical  failure  times. 

Table  1.  Observed  Coverage  Probabilities  with  500  Simulations 
using  Exponential  Censoring  (/3  =  0.3)  —  28%  Censoring. 


d 

n  = 

Covariate  1 

1000“ 

Covariate  2 

n  = 

Covariate  1 

40006 

Covariate  2 

8 

.976 

.982 

.982 

.966 

16 

.972 

.978 

.964 

64 

.958 

.966 

.960 

Table  2.  Observed  Coverage  Probabilities  with  500  Simula¬ 
tions  using  Exponential  Censoring  (/?  =  1.5)  —  68%  Censoring. 


d 

n  = 

Covariate  1 

1000° 

Covariate  2 

n  = 

Covariate  1 

4000c 

Covariate  2 

8 

.978 

.982 

.982 

.982 

16 

.972 

.980 

.984 

64 

.958 

.974 

.974 

.980 

Observe  from  Tables  1  and  2  that  the  bands  appear  to  be  conservative  but  as  d 
increases  the  coverage  probabilities,  on  the  whole,  get  closer  to  their  nominal  value  of  .95. 
This  is  true  for  both  covariates  under  both  light  (28%)  and  heavy  (68%)  censoring.  On  the 
other  hand,  in  all  the  simulations  we  carried  out,  the  coverage  probabilities  for  the  pointwise 
confidence  intervals  were  very  close  (not  significantly  different)  to  .95,  a  typical  case  being 
given  in  Table  3  below  (see  column  5). 


Table  3.  True  and  Average  Estimated  Cumulative  Hazard 
Function,  Standard  Deviation  of  the  Simulated  Estimates  and 
confidence  Interval  Coverage  Probabilities  with  500  Simulations, 
28%  Censoring  (/?  =  0.3),  n  =  1000,  d  =  8,  Covariate  1. 


Time 


True 

integrated 

hazard 


Average  of 

simulated 

estimates 


Standard  deviation 
of  simulated 
estimates 


Proportion  of 
confidence  intervals 
containing  the 
true  value 


Also  observe  from  Table  3  that  the  estimators  appear  to  be  unbiased,  in  support  of 
the  heuristic  arguments  in  section  2. 

Figures  1-4  contain  plots  of  the  estimators  under  light  (28%)  and  heavy  (68%)  cen¬ 
soring  for  d=8  and  d=64.  The  sample  size  n  was  set  to  2000  and  the  random  number 
generator  used  the  same  initial  seed  for  all  runs,  so  the  failure  times  used  in  each  run  are 
identical. 

[Insert  Figures  1-4  here] 


As  expected,  the  bands  are  wider  under  the  heavier  censoring;  compare  Figures  3 
and  4  with  Figures  1  and  2.  Also,  it  appears  that  the  estimator  which  uses  d=8  (in  Figures 
1  and  3)  is  a  smoothed  version  of  the  estimator  which  uses  d=64  (in  Figures  2  and  4). 
Although  the  estimators  are  adequate  in  each  case,  notice  that  under  heavy  censoring  the 
estimator  which  uses  d=64  (in  Figure  4)  oscillates  considerably  when  t  is  close  to  1.  This 
is  due  to  the  very  small  number  of  observed  failures  (averaging  less  than  5)  for  each  of 


the  sieve  intervals  in  this  region.  A  more  flexible  procedure  would  be  to  merge  adjoining 
intervals  which  contain  very  few  observed  failures.  For  contrast,  in  Figure  2  notice  that  the 
oscillation  is  negligible.  This  is  because  of  the  lower  censoring  rate  which  gives  an  average 
of  more  than  10  observed  failures  per  interval  near  t=l. 

4.  APPLICATION  TO  THE  ANALYSIS  OF  CANCER  MORTALITY 
AMONG  JAPANESE  ATOMIC  BOMB  SURVIVORS 

The  Radiation  Effects  Research  Foundation  (RERF)  in  Hiroshima,  Japan,  has  fol¬ 
lowed  since  1950  a  group  of  over  100,000  atomic  bomb  survivors.  Data  on  these  survivors 
is  the  primary  source  of  information  on  the  epidemiologic  effects  of  ionizing  radiation.  The 
National  Research  Council  (1980)  report  and  the  report  by  Kato  and  Schull  (1982)  contain 
detailed  analyses  of  these  data.  However,  as  noted  by  Pierce  and  Preston  (1984),  a  diffi¬ 
culty  with  the  methods  used  in  these  reports  is  that  they  do  not  make  explicit  allowance 
for  temporal  variation  in  risks;  they  simply  average  the  risk  over  the  follow-up  period. 

Pierce  and  Preston  (1985)  analyzed  cancer  mortality  among  the  atomic  bomb  sur¬ 
vivors  using  a  parametric  additive  risk  model  which  they  called  the  excess  risk  model.  They 
found  that  background  and  excess  rates  of  cancer  mortality  vary  markedly  with  age  at 
exposure  and  time  since  exposure. 

The  approach  taken  here  is  to  fit  a  nonparametric  additive  risk  model  of  the  form 
(1.1)  for  each  of  4  cohorts  defined  by  the  age  at  exposure  intervals  0-9,  10-19,  20-34  and 
35-49  years  of  age  at  time  of  bomb.  The  time  variable  t  is  time  since  exposure,  ranging  from 
5  to  37  years  over  the  follow-up  period  from  1950  to  1982.  The  only  censoring  prior  to  the 
end  of  follow-up  is  that  due  to  other  (non-cancer)  causes  of  death.  Summary  information 
for  each  cohort  is  give  in  Table  4. 


Table  4.  Cohort  Sizes,  Summary  Mortality  Figures  and  Cen¬ 
soring  Prior  to  End  of  Follow-up. 


Age  at 
exposure 

Cohort  size* 

Deaths  due  to 
Cancer  excluding 
Leukemia 

Deaths  from 
all  causes 

Censoring 

0-9 

18416 

93 

728 

87% 

10-19 

19242 

349 

1715 

80% 

20-34 

17694 

949 

3075 

69% 

35-49 

20916 

2788 

11234 

75% 

*  Approximate,  having  been  estimated  from  the  grouped  data. 


Three  covariates  are  used.  For  individual  i  they  are:  Yu  =  indicator  (male),  Y i2  = 
indicator  (female),  1^3  =  dose  (in  units  of  100  rads).  The  hazard  functions  and  a2  are 
the  background  cancer  mortality  rates  for  males  and  females  respectively.  The  third  hazard 
function  a3  is  the  excess  cancer  mortality  rate  per  100  rads  of  radiation  exposure. 

In  the  data  provided  to  us  by  the  RERF  time  since  exposure  is  grouped  into  eight 
4-year  intervals:  5-9,...,  33-37  years  since  exposure.  It  is  natural  to  use  these  eight  time 
intervals  as  the  partition  of  the  follow-up  period  in  the  evaluation  of  the  sieve  estimators.  As 
in  Pierce  and  Preston  (1985)  the  dose  variable  is  taken  as  the  midpoint  of  one  of  the  six  dose 
groups:  0,  1-50,  50-100,  100-200,  200-300,  >300  rads  with  dose=400  for  dose  >300.  Average 
doses  for  the  dose  groups  are  not  used  because  of  current  reevaluation  of  the  dosimetry. 
Also,  the  analysis  is  limited  to  the  epithelial  cancer  mortality,  in  which  leukemia  mortality 
is  excluded. 

Figures  5-8  indicate  our  estimates  and  95%  confidence  intervals  and  bands  for  the 
background  and  excess  cumulative  cancer  mortality  rates  as  functions  of  time  since  exposure. 
The  estimates  of  Pierce  and  Preston  are  also  plotted.  All  estimates  are  given  in  units  of 
deaths  per  1000  persons  at  risk.  Pierce  and  Preston’s  parametric  model  for  the  cancer 
mortality  rate  A (t)  is  given  by 

A(t)  =  exp  {t/0c  +  vu  +  log(e  +  t)}  +  0d  exp  {p0c  +  pt  log(t)},  (4.1) 


where  e  =  midpoint  of  age  at  exposure  interval,  s  =  sex,  and  d  =  dose.  The  first  and  second 
terms  in  (4.1)  represent  the  background  and  excess  cancer  mortality  rates  which  Pierce  and 
Preston  fitted  by  maximum  likelihood.  We  have  integrated  these  to  provide  a  comparison 
with  the  integrated  histogram  sieve  estimates. 

(Insert  Figures  5-8  here] 

From  Figures  5-8  we  see  that  there  is  a  significant  dose  effect  (i.e.  the  band  for 
dose  does  not  contain  the  zero  function)  for  all  cohorts.  This  is  despite  the  fact  that  the 
confidence  bands  with  eight  sieve  intervals  are  conservative  according  to  our  simulation 
results  in  section  3.  In  interpreting  Figures  5-8  bear  in  mind  that  the  vertical  scales  are 
different  for  each  cohort.  Despite  appearances,  the  dose  effect  bands  have  roughly  the  same 
width  for  each  cohort. 

On  the  whole  our  estimates  are  in  agreement  with  those  of  Pierce  and  Preston  (1985). 
We  also  observe  that  the  relative  risk  (dose  effect  vs.  background)  decreases  sharply  with 
age  at  exposure.  Our  estimates  for  the  dose  effect  are  somewhat  higher  than  Pierce  and 
Preston’s,  but  their  estimates  are  still  within  our  95%  bands.  However,  there  is  a  signifi¬ 
cant  difference  between  our  estimates  and  Pierce  and  Preston’s  for  the  female  background 
mortality  rate  in  all  but  the  0-9  years  of  age  cohort. 

The  data  used  for  the  analyses  in  this  section  are  contained  in  the  file 
RlOCANCR.DAT  which  is  described  in  the  Life  Span  Study  Report  10,  Part  I  published 
by  RERF.  The  file  was  supplied  to  us  by  RERF  which  is  a  private  foundation  that  is  funded 
equally  by  the  Japanese  Ministry  of  Health  and  Welfare  and  the  U.S.  Department  of  Energy 
through  the  U.S.  National  Academy  of  Sciences.  The  conclusions  reached  in  this  paper  are 
those  of  the  authors  and  do  not  necessarily  reflect  the  scientific  judgement  of  RERF  or  its 
funding  agencies. 

5.  ASYMPTOTIC  RESULTS 

In  this  section  we  give  a  general  formulation  of  the  asymptotic  results  which  underlie 
the  confidence  bands  in  section  2.  Definitions  of  the  martingale  concepts  used  in  this  section 
can  be  found  in  the  review  paper  of  Andersen  and  Borgan  (1985). 


First  we  look  at  the  basic  counting  processes  and  martingales  involved  in  the  random 
censorship  model.  Let  Ni(t)  be  the  indicator  of  an  uncensored  failure  for  subject  t  prior  to 
time  f, 

Ni(t)  =  I(Xi  <t,6i  =  1). 

Aalen  (1978)  showed  that  the  counting  process  Ni(t)  can  be  written  in  the  form 

p  rt 

N<(t)  =  E  /  ctMYijWds  +  MiW, 

3=1  J° 

where  Yij[t)  =  YijI{Xi  >  t)  and  Mi  is  a  square  integrable  martingale  with  respect  to  the 
filtration  7t  =  o(Ni(s),  Yiy(s+),  0  <  s  <  t,  t  >  1 ,/  =  Note  that  no  two  of  the 

counting  processes  Ni,...,Nn  jump  simultaneously. 

More  generally,  let  N(t)  =  (JVi(t), . . . , Nn(t)y,  t  e  [0,1]  be  a  multivariate  counting 
process  with  respect  to  a  right-continuous  filtration  ( 7t ) ,  i.e.  N  is  adapted  to  the  filtration 
and  has  components  Ni  which  have  sample  paths  which  are  non-decreasing,  right-continuous 
step  functions,  zero  at  time  zero,  and  with  jumps  of  unit-size.  Moreover,  suppose  that  no 
two  components  jump  simultaneously.  Let  A  be  the  compensator  of  N ,  so  that  N  ~  A  +  M 
where  M  —  (Mi, . . .  ,M„)'  and  Mi, . . .  ,Mn  are  local  martingales.  Aalen  (1980)  introduced 
the  model 

A(0  =  /  Y(s)a(s)ds  (5.1) 

Jo 

where  a  =  (ai,...,ap)'  is  a  vector  of  unknown  nonrandom  integrable  functions,  Y (i)  = 
(yiy(t))  is  an  n  x  p  matrix  of  covariate  processes,  with  Yiy(£)  representing  the  jth  covariate 
for  the  ith  subject  a  time  t.  The  covariate  processes  are  assumed  to  be  predictable  and 
locally  bounded  (which  is  the  case  for  instance  if  they  are  left-continuous  with  right-hand 
limits  and  adapted).  This  model  allows  much  more  general  censoring  mechanisms  (quite 
arbitrary  apart  from  being  predictable)  than  the  one  considered  in  section  2.  Unlike  Aalen 
we  do  not  assume,  unless  explicitly  stated,  that  ^^(l)  <  oo  (in  which  case  M;  is  a  square 
integrable  martingale),  only  that  Nj(l)  <  oo  almost  surely. 

Aalen  (1980)  discussed  estimators  A*  of  A  of  the  form 

A*(t)=  [*  Y  '(s)dN(s),  (5.2) 

Jo 
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where  Y  (s)  is  a  predictable  generalized  inverse  of  Y  (s) .  In  the  case  p  —  1 ,  the  choice 


[Y  («))i*  =  *fci(.s))  \  *  = 

k=l 

(where  1/0  =  0)  gives  the  Nelson- Aalen  estimator  for  which  a  general  asymptotic  theory 
was  derived  by  Aalen  (1978).  The  powerful  martingale  techniques  used  in  this  theory  allow 
the  conventional  i.i.d.  structure  to  be  superceded;  only  very  weak  asymptotic  stability  and 
Lindeberg  conditions  need  to  be  imposed  on  the  (single)  covariate,  see  Andersen  and  Borgan 
(1985,  p.  136). 

The  case  p  >  1  is  more  complicated.  First,  it  is  not  clear  which  generalized  inverse 
Y~  of  Y  to  use.  Aalen  (1980,  Remark  1)  considered 

Y-(s)  =  (Y'(s)Y(s))-lY'(s)  (5.3) 

which  he  motivated  by  a  formal  least  squares  principle  and  noted  that  the  resulting  estimator 

a(<)  =  /"(y'(.)y(«))-‘y'(.)<w(.)  (5.4) 

JO 

probably  gives  reasonable  (though  not  optimal)  estimates  of  A.  Second,  the  problem  of 
finding  the  asymptotic  distribution  of  the  general  estimator  of  the  form  (5.2)  is  intractable. 

5.1.  Aalen’s  least  squares  estimator 

Despite  the  difficulties  alluded  to  above  it  is  possible  to  obtain  the  asymptotic  distri¬ 
bution  of  A  in  a  straightforward  way  by  applying  Rebolledo’s  Central  Limit  Theorem  for 
local  square  integrable  martingales  in  the  form  given  by  Andersen  and  Gill  (1982,  Theorem 
1.2).  Let  D[0,  lp  be  the  product  of  p  copies  of  Z?[0,  l]  and  equip  it  with  the  Skorohod 
product  topology.  Denote 

=  (5-5) 

»=i 

*>«(<)  =  ^ 

TV 

1=1 


(5.6) 


CONDITIONS 


(Al)  (Asymptotic  stability).  For  j,k,l  =  there  exist  bounded  functions  Kjk 

and  Rjkt  defined  on  [0,1]  such  that 

sup  \Kjk(t)  -  Kjk{t) |  £  0 
t«[o,i] 

sup  |A/u(t)  -  /Zju(OI  ^  °- 
(A2)  (Lindeberg  condition).  For  each  j  =  1, . . .  ,p 

sup  |y»j(t)|  0. 

(A3)  (Asymptotic  regularity  condition).  The  pxp  matrix  function  K(t)  =  [Kjk[t]) 
in  (Al)  satisfies 

inf  det  Kit)  >  0. 


THEOREM  5.1.  Under  conditions  (Al)-(A3) 

s/n(A  —  A)  m  in  D\0,  l]p 

where  m  is  a  p-variate  continuous  Gaussian  martingale  with  m(0)  =  0  and  covariance 
functions 

Cov(mj(t),mfc(f))  =  G]k(t) 

where 


°A‘)  =  E  E  E  /'*«•  w(«-‘w),.(K'-,w)^o„w<is.  (5.7) 

u=lo  =  ltu  =  l 

The  following  theorem  shows  that 

Gik{t)  =  JZ  £  £  ^[•)(K-li»))MK-l{s))kwdAM  (5.8) 

U=1  W=1  Ul  =  l  '  ® 

is  a  uniformly  consistent  estimator  of  the  covariance  matrix  of  the  limiting  Gaussian  mar¬ 
tingale.  In  the  survival  analysis  context  G}J  is  given  by  (2.6). 


THEOREM  5.2.  Under  conditions  (A1)-(A3),  for  each  j,k  =  l,. . .  ,p 


sup  \G}k[t)  -  Gjk(t)\^0. 

te|0,l] 

Combining  Theorems  5.1  and  5.2  with  standard  results  from  Billingsley  (1967,  Theorems 
4.1  and  5.1)  shows  that  (2.5)  is  an  asymptotic  100(1  -  a)%  confidence  band  for  Ay.  The 
transformation  to  the  Brownian  bridge  which  is  used  in  deriving  this  confidence  band  is 
described  by  Andersen  and  Borgan  (1985,  p.  114). 

The  Lindeberg  condition  (A2)  holds  if  the  covariates  are  bounded  by  random  variables 
having  a  bounded  rth  moment  for  some  r  >  2 ,  c.f.  the  discussion  of  Andersen  and  Gill  (1982, 
p.1110)  in  the  context  of  Cox’s  proportional  hazards  model. 

The  following  theorem  gives  conditions  under  which  (Al)-(A3)  hold  in  the  i.i.d.  case 
in  which  the  (Ni,Yn, . . .  ,Yip)  are  i.i.d.  replicates  of  (Nlf  Vn, . . . ,  Yip) . 

THEOREM  5.3.  In  the  i.i.d.  case  with  Y  left- continuous  with  right-hand  limits,  conditions 
(A1)-(A3)  are  satisfied  if  for  each  j  =  1, . . .  ,p 

£(  sup  |Fx3  (t)|)  <  oo  (5.9) 

and  the  pxp  matrix  K(t) ,  notu  defined  by  K}k{t)  =  EYij(t)Ylk(t) ,  satisfies  condition  (A3). 


5.2.  Integrated  histogram  sieve  estimator 


The  histogram  sieve  estimator  is  defined  in  the  setting  of  the  counting  process  model 
(5.1)  by  a(t)  =  ar ,  for  t  eJr,  where  ar  =  (ary)  is  the  p  x  1  vector  give  by  ar  =  J D~1Cr, 
and  Cr  is  the  p  x  1  vector 

Cr=  [  Y'(s)dN(s), 

J  It 


Dr  is  the  pxp  matrix 


Dr  =  [  Y'{s)Y{s)ds. 

J  It 

Note  the  similarity  between  the  integrated  histogram  sieve  estimator 


» 

% 

I1 


and  Aalen’s  least  squares  estimator.  However  A  is  not  (Jt)  -  adapted  and  the  derivation  of 
its  asymptotic  distribution  theory  is  much  more  difficult.  The  following  result  of  McKeague 
(1987)  gives  the  asymptotic  distribution  of  A. 


% 


CONDITIONS. 

(Cl)  (Asymptotic  stability).  For  j,  k,  i  =  l,...,p  there  exist  function  K3k  and  Rjkt 
defined  on  [0, 1]  such  that 

sup  | Kjk{t)  -  K]k(t) |  =  Op(n~$) 
u|o,i] 

sup  \R]kt{t)  -  K]kt[t)\  ^  0. 

HM 

(C2)  (Lindeberg  condition).  Same  as  (A2). 

(C3)  (Asymptotic  regularity  condition).  The  p  x  p  matrix  K(t )  =  (Ayfc(t))  is  nonsin¬ 
gular  for  all  t  e  [0, 1] . 

(C4)  The  functions  atj,Kjk,Rjkt  j,h,i  =  l,...,p  are  Lipschitz. 

THEOREM  5.4.  If  conditions  (Cl)-(C4)  hold,  ENi(l)  <  oo  for  all  i  >  1, 
v/nmaxffi,.,.,^)  — ♦  0,  and  n  min(£i, ...  ,£<j)  — ►  oo  then 

y/n(A  —  A)  m  in  C[0,  l]p 


where  m  is  the  p-variate  continuous  Gaussian  martingale  of  Theorem  5.1. 


and 

1  n  [ 

Ruvw(t)  -  —  («)<*«,  for  t  e  Ir. 

T  »=i  J Ir 

In  the  survival  analysis  context  Gyy  simplifies  to  (2.2).  It  can  be  shown,  under  the  hypothe¬ 
ses  of  Theorem  5.4,  that  Gyy  is  a  uniformly  consistent  estimator  of  Gyy  (McKeague,  1987). 
It  follows  that  (2.1)  is  an  asymptotic  100(1  —  a)%  confidence  band  for  Ay 

5.3.  Weighted  least  squares  estimators 

Neither  Aalen’s  least  squares  estimator  nor  the  integrated  histogram  sieve  estimator 
is  an  optimal  estimator.  In  the  case  of  a  single  covariate  the  Nelson-Aalen  estimator  is 
optimal.  However,  if  weighted  least  squares  is  used  instead  of  ordinary  least  squares,  it  is 
possible  to  improve  th  performance  of  Aalen’s  least  squares  and  the  integrated  histogram 
sieve  estimators.  Specifically,  consider  the  weighted  least  squares  estimator 

A(i)  =  f  (F/(s)W(s)y(s))-1F'(s)W(s)d7V(s)  (5.10) 

Jo 

where  W(t)  is  the  n  x  n  diagonal  matrix  having  ith  diagonal  entry 

Wi(0  =  MOfyW)  >  i  =  1 , . . . ,  n 
3  =  1 

where  dy  is  a  predictable  estimator  of  ay  and  1/0  =  0.  Note  that  A  is  an  estimator  of  the 
form  (5.2).  In  the  case  of  a  single  covariate  A  coincides  with  the  Nelson-Aalen  estimator 
and  no  companion  estimator  di  is  needed.  For  more  than  one  covariate  a  predictable 
estimator  dy  for  ay  can  be  constructed  by  kernel  smoothing  of  Aalen’s  (ordinary)  least 
squares  estimator  A  as  follows.  Let 

MO  =  r  f 

On  J 0  On 

where  K  is  a  continuous  function  having  support  [0,1]  and  integral  1,  and  6n  >  0  is  a 
bandwidth  parameter  which  tends  to  zero  as  n  — *  oo.  Under  conditions  on  ay,fc  and  hn 
uniform  consistency  of  dy  can  shown  using  similar  techniques  to  those  of  Ramlau-Hansen 
(1983)  for  the  Nelson-Aalen  estimator.  Finally,  along  the  lines  of  the  proof  of  Theorem 
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5.1,  it  can  be  shown  under  mild  regularity  conditions  that  •v/n(A  -  A)  converges  weakly  in 
D[0,  l]p  to  a  p-variate  Gaussian  martingale  m  with  m(0)  =  0  and  covariance  matrix 


Cov(m(t),m(t))  =  f  V  l(s)ds 
Jo 

where  (in  the  i.i.d.  case)  V  (s)  is  the  pxp  matrix  with  entries 


Ff  n,-(Qr.*(0  i 

,l) 


The  weighted  version  of  the  histogram  sieve  estimator  and  its  integrated  counterpart  can 
be  developed  in  an  analogous  fashion.  The  weighted  least  squares  approach  will  be  treated 
at  length  in  a  subsequent  paper. 


APPENDIX 


PROOF  OF  THEOREM  5.1.  First  split  y/n(A[t)  -  A(f))  into  four  parts  using  (5.1),  (5.4) 
and  (5.5): 


where 


1 

[  K~1(s)  Y'(s)dM(s) 

Jo 

(A.l) 

1 

f  (J{s)-l)K~1{s)Y,{s)dM{s) 

Jo 

(A- 2) 

i 

+  A 

f*  J{s)[k~l{s)  -  K~l{s )]  Y'{s)dM{s) 

Jo 

(A3) 

+  y/n 

. 

f\j(s)-l)dA(s) 

' 0 

(AA) 

J(t)  = 

I{K{t)  is  invertible). 

{A- 5) 

The  local  martingales  are  local  square  integrable  martingales  on  the  time  in¬ 

terval  [0,1]  and  their  predictable  quadratic  variation  processes  are  given  by 


<  Mi,  Mj  >  (t)  =  t  Xi(s)ds  and  <  Mi,  My  >  (f)  =0,  i  ±  j 
Jo 


IT*  V*  IT*  \r*V*'J*V* I7»  VW\ 


where 


A.(t)  =  Y1  0,(1)  Yij(t). 


Now  consider  (A.l)  which  can  be  written  as  X^"*(l)  =  (xjn^  (1), . . . ,  X':.r' 1  (())'  where 

Xjn,(0  =  £  f  H$\s)iMM 

1=1  ■'» 

Condition  (A3)  implies  that  the  functions  (K~1(t))Jk  are  bounded  so  that  is  a  local 

square  integrable  martingale.  Also,  from  (A. 6) 

<  x^x'-1  >  (i)  =  f  Yi 

Jo  i= i 

so  that  using  the  asymptotic  stability  condition  (Al),  for  all  j,k  and  t 

<x]"),xi"»  >(!)  4  <?/„(!) 

as  n  — ►  oo.  Next,  the  boundedness  of  the  functions  {K~l(t))jk  and  condition  (A2)  imply 
the  following  Lindeberg  condition:  for  all  j  and  e  >  0 

f1 'Eni?(‘)2H‘)H\H$,(t)\>t)dt  4  o 

i=l 

as  n  — ►  oo .  By  Rebolledo’s  Central  Limit  Theorem  for  local  square  integrable  martingales 
in  the  form  given  by  Andersen  and  Gill  (1982,  Theorem  1.2)  it  follows  that  m  in 

£>[o,i]p. 

Note  that  by  conditions  (Al)  and  (A3), 


P(K(t)  is  invertible  for  all  t  e  [0,1])  — ♦  1 


oo.  This  shows  that  terms  (A. 2)  and  (A. 4)  converge  uniformly  to  zero  in  probability 


•-»y 


Finally  consider  (A. 3)  whose  jth  component  is  given  by 


2i”lW  =  4r  E  / 

'/n  Jo  “ 

which  is  a  local  square  integrable  martingale.  By  Lenglart’s  (1977)  inequality,  for  each 
e  >  0,  rj  >  0 


P(  sup  |zjn)(!)|  >  s)  <  ^  +  J>(<  zjn\zj.">  >  (1)  >  ,). 

tc[0, 1}  J  J 


(A- 8) 


Now  evaluate  <  zjn\zjn^  >  using  (A.6)  to  obtain 


<  >  (1)  =  i  E  ['  J(s){El^_1(<)  - 

n  <=1  •/0  fc=l 

<  p  E{  sup  ||/f-‘(!)  -  jr-‘(0UI}{-  E  ['  ViK*)  Xi W*J.  (A.9) 
t=,  <<IM  "  St  Jo 

By  the  componentwise  continuity  of  the  matrix  inverse  operation  and  conditions  (Al),  (A3) 
we  have  for  the  first  term  in  (A.9) 


sup  |[K  l{t)-K  l(t)]/*|  ^  0. 

tc [0, 1] 

Also,  using  condition  (Al),  for  the  second  term  in  (A.9) 


(A.10) 


/  yifc(s)  ^i{s)ds  ^  /  Rkkj{t)a3{t)dt.  (A. 11) 

”  t  =  l  J°  ]  =  l 

Combining  (A.8)-(A.ll)  we  conclude  that  (A. 3)  converges  uniformly  to  zero  in  probability 
as  n  — »  oo .  This  completes  the  proof  of  the  theorem,  g 

PROOF  OF  THEOREM  5.2.  Fix  j  and  k.  Let  L(t)  and  L(t)  be  the  1  x  p  vectors  with 


components 


Lu(t)  =  E  E  X*~(t)(K-l(t)b.(K-l(t))k% 


u— 1 w=l 


Lu(t)  -XL  X*™(t)(K-l{t))iv(K-l(t))kw. 


v=  1  w=  1 


Using  (5.1),  (5.4)  and  (5.5)  we  can  write 


G3k{t)~Gjk{t)  =  -  [  L(s)K~1(s)  Y,(s)dM(s)  (A.  12) 

n  Jo 

+  f  J(s)(L[s)  —  L(s))dA(s)  (A. 13) 

Jo 

+  /  (J(s)  -  l)L(s))dA(s).  (A.  14) 

Jo 

The  term  (A. 14)  is  dealt  with  using  (A. 7).  From  (A. 10)  and  the  asymptotic  stability  con¬ 
dition  (Al) 

sup  \Lu{t)  -  L{t)\  0, 

“[0,1] 

which  deals  with  (A. 13).  Finally,  using  Lengiart’s  inequality  it  is  seen  that  (A. 12)  also 
converges  uniformly  in  probability  to  zero.  | 

PROOF  OF  THEOREM  5.3.  Apply  the  Strong  Law  of  Large  Numbers  in  D\ 0,  l]  (Rao, 
1963)  in  the  reversed  time  direction.  | 
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Key  to  Figures  1-8 


Thick  lines  =  integrated  histogram  sieve  estimates. 

Regular  dashed  lines  =  95%  pointwise  confidence  limits. 

Thin  lines  =  95%  confidence  bands. 

Irregular  dashed  lines  =  true  integrated  hazard  functions  (in  Figures  1-4). 
Lines  marked  with  boxes  =  estimates  of  Pierce  and  Preston  (in  Figures  5-8) 
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Figure  4.  Heavy  censoring  (68%  ),  n  =  2000,  d  =  64.  (a)  Covariate  1.  (b)  Covariate  2. 
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Figure  0.  10-19  Years  of  Age  at  Exposure,  (a)  Background  for  males,  (b)  Background  for 
females,  (c)  Excess. 
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20 .  ABSTRACT 

Cox’s  (1972)  proportional  hazards  model  has  so  far  been  the  most  popular  model  for 
the  regression  analysis  of  censored  survival  data.  However,  the  additive  risk  model  of  Aalen 
(1980)  provides  a  useful  and  biologically  more  plausible  alternative  when  large  sample  size 
makes  its  application  feasible.  Let  A(t)  =  A(t,  Y)  be  the  hazard  function  for  a  subject  whose 
covariates  are  given  by  Y  =  (Yi, . . . ,  Yp)'.  Aalen ’s  model  stipulates  that  A(f)  =  Y'a{t ), 
where  a  =  (ai,. ..  ,ap)#  is  an  unknown  vector  of  hazard  functions.  This  paper  discusses 
inference  for  ai,...,ap  based  on  continuous  and  grouped  data.  Asymptotic  distribution 
results  are  developed  using  the  theory  of  counting  processes  and  used  to  provide  confidence 
bands  for  the  cumulative  hazard  functions.  The  method  is  applied  to  data  on  the  incidence 
of  cancer  mortality  among  Japanese  atomic  bomb  survivors. 
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